BITS Meetings' Virtual Library

Abstracts from Italian Bioinformatics Meetings from 1999 to 2013

766 abstracts overall from 11 distinct proceedings


1. Anselmi C, Bocchinfuso G, Scipioni A, De Santis P
Identification of Protein Domains on Topological Basis
Meeting: BIOCOMP 2000 - Year: 2000
Topic: Proteins analysis and structure prediction

Abstract: A theoretical method is proposed to identify structural domains in proteins of known structure. It is based on the distribution of the local axes of the polypeptide chain. In particular, a statistical analysis is applied to the contributions of the local axes to the absolute writhing number, a topological property of a space curve resulting from the number of self-crossings in the curve projections onto a unit sphere. This finding supports the hypothesis that topological requirements should be satisfied in the process of protein folding and in the final organization of the tertiary structures.

2. Attimonelli M, Lanave C, Pesole G, Liuni S, D'Elia D, Catalano D, Licciulli F, Grillo G, De Robertis M, Pasimeni R, Saccone C
MitBASE, AMmtDB and MitoNuc: a pool of specialized MITOCHONDRIAL databases.
Meeting: BIOCOMP 2000 - Year: 2000
Topic: Databanks

Abstract: The last twenty years have seen two major technological revolutions: the development of recombinant DNA techniques and the development of information technology. Increasingly advanced sequencing methods have made a large amount of data available, but the usefulness of these data depends closely on the availability of computational tools that allow them to be stored and rationally catalogued so that they can be analysed. This has created the need for specialized databases. MitBASE, AMmtDB and MitoNuc are three specialized mitochondrial databases developed by the Bari bioinformatics group. MitBASE is a database that collects, in an integrated manner, mitochondrial DNA sequences from different organisms. It was made possible by a collaboration among seven European research groups, each of which handled the collection and encoding of data for a specific group of organisms (human, vertebrates, invertebrates, protists, fungi, plants and algae). The nucleotide sequences and any variants of them, collected from the primary databases and from the literature for the different organisms, were then enriched with additional information specific to each node. The Bari research group handled the structuring and encoding of data on mitochondrial DNA variants in human and other vertebrates, with particular attention to data from human population genetics studies and from studies related to mitochondrial diseases. An additional node was also developed to collect sequences of nuclear genes of the yeast Saccharomyces cerevisiae involved in mitochondrial biogenesis. The database is available at: http://www3.ebi.ac.uk/Research/Mitbase/mitbase.pl.
AMmtDB, by contrast, is a database consisting of a collection of multiply aligned sequences of mitochondrial genes from vertebrates and invertebrates. The aligned sequences cover genes coding for proteins and tRNAs. Multiple alignments of the mammalian D-loop region are also included. All data are structured so that they can be queried through the SRS retrieval system at: http://bio-www.ba.cnr.it:8000/BioWWW/#AMMTDB. MitoNuc is a specialized database of metazoan nuclear genes involved in mitochondrial biogenesis. The information for each gene, for example the submitochondrial localization of the product, its tissue specificity if any, the signal peptide, and the 5' and 3' UTR regions of the mRNA, is structured to allow efficient retrieval. This database can be profitably used to study the structural and functional properties of nuclear genes coding for mitochondrial proteins, of their products, and of the interactions between the nuclear and mitochondrial genetic systems. The database is available at: http://bio-www.ba.cnr.it:8000/srs6/

3. Bersani E, Aluffi-Pentini F, De Fonzo V, Parisi V
The virtual cell: from genomics to proteomics. Metabolic networks and protein networks.
Meeting: BIOCOMP 2000 - Year: 2000
Topic: Others

Abstract: The Genome Project, aimed at sequencing the entire human genome, is providing useful information on the structure and regulation of human DNA and will allow the complete mapping of all genes and their distribution across the individual chromosomes. To understand how the genome functions, however, it will be necessary to identify the encoded proteins and to understand their interaction networks in the various cellular compartments, in order to complete our understanding of the cell's life cycle and to fully understand the malfunctions linked to metabolic disorders and tumours. The simplest networks to study are the metabolic ones, whereas the interaction networks among enzymes are much more intricate. Our group has begun a study aimed at implementing the virtual cell. The first step of this study was the simulation of a cascade of interacting kinases/phosphatases (BIOCOMP1999), restricted to problems without spatial dimensions, that is, neglecting space (transport and diffusion). These kinase/phosphatase cascades cause specific signals from a membrane receptor to be transduced towards the nucleus. The results obtained with this simulation (in terms of dynamic variables that quantify the concentration of each enzyme considered, inactive or activated) are promising, and we are extending the models to include, besides the metabolic network, other protein interactions and signal transduction inside the cell nucleus. In the near future we will also take three-dimensional structure into account in order to study phenomena such as those related to calcium waves and oscillations. Building this "virtual cell" will make it possible to study phenomena such as apoptosis and the cell cycle, in an attempt to understand the onset of neurodegenerative diseases and tumours.
The extended model makes it possible to include the transcription factors that determine cell-type-specific gene expression, and to understand how activity inside the nucleus influences the functioning of the human genome beyond its simple subdivision into coding and non-coding regions.

4. Bordo D, Spallarossa A, Hangyi I, Ranise A, Bolognesi M
The phosphoryl transfer reaction in the bacterial PTS. Model of the HPr~P~IIANtr protein complex
Meeting: BIOCOMP 2000 - Year: 2000
Topic: Proteins analysis and structure prediction

Abstract: The histidine-containing protein HPr plays a central role in the phosphotransfer reaction that, in the bacterial phosphoenolpyruvate:sugar phosphotransferase system, leads to the phosphorylation of specific carbohydrates at the time of their translocation across the bacterial membrane. In Escherichia coli HPr is also able to phosphorylate the protein IIANtr, which is encoded by the rpoN operon and is involved in nitrogen regulation in bacteria. As the phosphoryl transfer reaction occurs concomitantly with the formation of a transient complex between HPr and IIANtr, the model of the P~HPr in complex with IIANtr (HPr~P~IIANtr) was built by in silico analysis. The model obtained is fully compatible with data describing the NMR chemical shifts relative to the interaction between HPr and IIAMtl, a protein which is structurally similar to IIANtr. The model shows that, due to good surface complementarity of the two proteins, intermolecular hydrogen bonds are formed by the invariant amino acids Arg17 of HPr and Arg57 of IIANtr. Other intermolecular interactions have hydrophobic character.

5. Bortoluzzi S, D'Alessi F, Romualdi C, Danieli GA
The human adult skeletal muscle transcriptional profile reconstructed by a novel computational approach
Meeting: BIOCOMP 2000 - Year: 2000
Topic: Sequence analysis

Abstract: In human genomics, the high throughput analysis of general databases promises to provide relevant information and to produce novel biological knowledge. This strategy might be applied as well to the reconstruction of transcriptional profiles of different tissues or of the same tissue in different developmental stages (Bortoluzzi and Danieli, 1998). We developed a novel software tool to mine the UniGene database, to retrieve extended datasets and to merge them. By applying this tool, information on 4,080 UniGene clusters was retrieved, belonging to three adult human skeletal muscle cDNA libraries, selected for being unnormalised or unsubtracted. The software processed the resulting records, which were sorted according to specific criteria. In particular, for the present work, the field 'ESTs SEQUENCES' was considered. For each entry, the number of ESTs obtained from skeletal muscle cDNA libraries was annotated. If additional ESTs, obtained from different tissues, were reported in the record, their presence was also automatically annotated by the program, to produce information on the expression of the corresponding gene in different human tissues. The level of expression of each gene was estimated as a percentage of the total transcriptional activity, by dividing the number of skeletal muscle ESTs corresponding to a given entry by the total number of skeletal muscle ESTs reported for all the entries included in the catalogue. The basic assumption is that the number of detected ESTs per gene is a function of the transcript frequency in the population of mRNAs. The expressed genes were classified according to their different levels of expression. Only 10% of the genes expressed in muscle were found to be transcribed at a high level, yet they contribute more than 50% of the total transcriptional activity of the tissue. The large majority of genes expressed in the adult skeletal muscle appear to be active also in several different tissues.
Most skeletal muscle genes were found in at least one additional tissue and about 89% of them in more than 4 additional tissues. Forty-seven entries (1.2% of the total) were found only in cDNA libraries obtained from human skeletal muscle. The validation of this "in silico" approach was attempted by a comparison with SAGE data on genes expressed in skeletal muscle, reported in the Rochester SAGE catalogue (Welle et al, 1999), release July 1999. We considered the expression data concerning the 295 tags corresponding to fully annotated genes highly expressed in skeletal muscle. The correspondence between 120 genes belonging to both catalogues was established by using the GenBank ID, and the levels of expression of each gene were compared. The results obtained by the two methods showed a statistically significant concordance (Bortoluzzi et al., 2000). Transcriptional profiles of the adult human skeletal muscle and of the adult human retina, produced in our laboratory, are published at the dedicated web site GETProfiles (http://telethon.bio.unipd.it/GETProfiles/index.html).
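
The expression estimate described in this abstract (skeletal muscle ESTs for one entry divided by the total skeletal muscle ESTs in the catalogue) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' software: the function name, the input format, the cluster identifiers and the counts are all invented for the example.

```python
def expression_levels(est_counts):
    """Return each entry's share of the total EST count, as a percentage.

    est_counts maps a cluster identifier to the number of skeletal
    muscle ESTs assigned to it (hypothetical input format).
    """
    total = sum(est_counts.values())
    if total == 0:
        raise ValueError("no ESTs in the catalogue")
    return {cluster: 100.0 * n / total for cluster, n in est_counts.items()}


# Invented counts for three hypothetical clusters.
counts = {"cluster_A": 60, "cluster_B": 30, "cluster_C": 10}
levels = expression_levels(counts)
for cluster, pct in sorted(levels.items(), key=lambda kv: -kv[1]):
    print(f"{cluster}: {pct:.1f}% of skeletal muscle ESTs")
```

The percentages always sum to 100, so entries can then be binned into expression classes by thresholds of the reader's choosing; the abstract does not state which thresholds were used.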

6. Bosotti R, Isacchi A, Sonnhammer ELL
A novel N-terminal domain in PIK-related kinases: the FAT domain
Meeting: BIOCOMP 2000 - Year: 2000
Topic: Modelling

Abstract: Phosphatidylinositol kinases are found in all eukaryotes and serve important functions in phosphatidyl-inositol (PI) signaling pathways. Recently, a new subfamily of the PI kinase superfamily involved in meiotic and V(D)J recombination, chromosome maintenance and repair, cell cycle progression and cell cycle checkpoint has emerged, called PIK-related. This family includes the ATM, ATR, DNA-PK, ESR1, Rad3, TOR1, TOR2, FRAP and TEL1 kinases. These are large proteins (2000-4000 aa) that share similarity to classical PI kinases only in the ~300 aa kinase domain. Another group distantly related to PI kinases comprises the TRRAP proteins. They also share similarity to the PI kinase domain; however, they lack the catalytic residues and indeed none of them has been shown to possess kinase activity. It has previously been noted that the TRRAP and PIK-related proteins share a unique motif at the C terminus. Analysis of the remaining sequence has so far not been able to clearly define shared domains in the large N-terminal portions. We here describe a novel homology domain spanning ~500 aa, N-terminal to the PI kinase domain in the PIK-related and TRRAP subfamilies. We call this domain FAT after representatives of the three main groups sharing the domain (FRAP, ATM, and TRRAP). This domain is present only in the FRAP, ATM and TRRAP subfamilies; it is not found outside these subfamilies and always coexists with the C-terminal domain previously identified. It is possible that they fold together in a configuration that is necessary for proper function of the PI kinase domain, which is wedged in between the FAT and the C-terminal domains.

7. Brannetti B, Via A, Montecchi-Palazzi L, Cesareni G, Helmer-Citterich M
SH3-SPOT: prediction of the recognition specificity of SH3 domains
Meeting: BIOCOMP 2000 - Year: 2000
Topic:

Abstract: We used SH3-SPOT to make predictions between individual SH3 sequences and lists of peptides, proteins or protein databases (nrl_3d and swissprot). The predictions agree well with experimental data where these are available. The method can be used to predict the specificity of any protein for which the following are available: 1) the structure of at least one protein/peptide or protein/protein complex; 2) experimental interaction data between the protein and lists of peptides. For the SH3 domain, we currently have 8 complex structures and about 300 peptides that bind some twenty SH3 domains. The predictive power of the method could increase as the database of structures or of phage display data grows. We are currently improving the program by computing entropy to "weight" the interface residues and by taking into account the side-chain length of the residues involved in the interaction when calculating residue-residue contact frequencies. The new SPOT program will be applied to the analysis of the specificity of MHC molecules and to the study of DNA-protein interactions. Barbara Brannetti, Allegra Via, Gianluca Cestra, Gianni Cesareni and Manuela Helmer Citterich (2000). SH3-SPOT: an algorithm to predict preferred ligands of different members of the SH3 gene family. JMB in press.

8. Calogero RA, Iazzetti G
PRO2INS: a database for annotating protein-protein interactions
Meeting: BIOCOMP 2000 - Year: 2000
Topic: Databanks

Abstract: In recent years, techniques such as the yeast two-hybrid system have produced a considerable body of data on protein-protein interactions, effectively opening the way to understanding the intricate network of protein interactions that regulate cell functions. Although a considerable number of databases dedicated to proteins and to protein structural domains are available, protein-protein interactions are rarely annotated in them (e.g. structurally characterized protein-protein interaction domains). Because protein interaction data are therefore not grouped together in any way, tracing the possible interaction pathways among several proteins is rather tedious. The PRO2INS (PROtein-PROtein INteractionS) database was created as a collection point for protein interaction data and was built mainly from literature data (MEDLINE). PRO2INS was developed using the capabilities of the VRML 2.0 language for building virtual worlds. In particular, VRML 2.0 made it possible to create a three-dimensional network in which proteins are represented by the junction points (nodes) between the strands of the network, while the strands represent protein-protein interactions. The nodes (proteins) are drawn as cylinders whose size is proportional to the length of the protein. Within the cylinders (nodes), the domains that interact with other proteins are shown as coloured bands from which the strands (protein-protein interactions) originate. PRO2INS currently contains more than 100 proteins for which an interaction has been demonstrated experimentally.

9. Cannata N, Dioguardi R, Fontana P, Scannapieco P, Toppo S, Lanfranchi G, Valle G
An integrated knowledge-base of gene expression in human skeletal muscle
Meeting: BIOCOMP 2000 - Year: 2000
Topic: Databanks

Abstract: We have built a solid scaffolding that can hold and connect muscle transcript sequencing data to functional data, expression profiles, genomic sequences and genetic diseases. The starting point is the wide collection of skeletal muscle ESTs produced at CRIBI, which are automatically analysed, filtered and stored in a SQL table (HSPD-EST). A schematic view of the organization of the data is shown in the figure. ESTs are assembled into clusters (HSPD-CLUSTER table), which are very transitory entities as they may change at every new assembly depending on the order in which the ESTs were merged or on the presence of new variant isoforms determined by alternative splicing or paralogue genes. On the other hand, many transcripts have now been well characterised and therefore should be considered as stable entities. Therefore, we decided to implement a Transcript Integrated Table (TRAIT) of human skeletal muscle that includes some of the established information that is already available. As can be seen in the figure, we have also implemented a Single-Transcript Integrated Table (STRAIT), where different transcripts are stored in different records, even if they come from the same gene, for instance after alternative splicing. Therefore, every single transcript is recorded in STRAIT, while TRAIT is used to link together those transcripts that originated from the same gene. When a new cluster is discovered, a provisional STRAIT record is automatically created. Records become permanent after the addition of further information such as full length sequencing, functional studies and high density hybridisation experiments, which are currently performed in our laboratory. All the above information is organised under an SQL database management system, in a protected intranet environment, currently including more than 4,000 STRAIT records.
All the tables are periodically translated into SRS databases and are accessible on the web at http://grup.bio.unipd.it/. The full implementation of the other databases (shown in the figure in light blue) is currently under way. In particular, a series of scripts and automatic procedures have been developed, linking full and partial transcripts to genomic sequences in view of the release of the entire human genome sequence. Our scripts make use of programs such as Blast, GeneFinder and Sim4 to perform this analysis systematically on every transcript of our database. The identification of the genomic sequence allows a simple and exact localisation of the genes and gives an indication of the full length sequence, introns, exons, alternative splicing and promoter region. Similar systematic procedures are also under way to link our muscle transcripts to sequences from model organisms such as yeast, C. elegans, Drosophila and mouse.

10. Castrignanò T, Chillemi G, Desideri A
Structure and Hydration of Bam HI DNA recognition site: a molecular dynamics investigation
Meeting: BIOCOMP 2000 - Year: 2000
Topic: Proteins analysis and structure prediction

Abstract: The results of a 3 ns molecular dynamics simulation of the dodecamer duplex d(TATGGATCCATA)2 recognised by the Bam HI endonuclease are presented here. The DNA has been simulated as a flexible molecule using the AMBER force field and the Ewald summation method, which eliminates the undesired effects of truncation and allows the full effects of electrostatic forces to be evaluated. The starting B conformation evolves toward a configuration quite close to that observed through X-ray diffraction in its complex with Bam HI. This configuration is fairly stable and the Watson-Crick hydrogen bonds are well maintained over the simulation trajectory. Hydration analysis indicates preferential hydration of the phosphate oxygens over the ester oxygens. Hydration shells in both the major and minor groove were observed. In both grooves the C-G pairs were found to be more hydrated than A-T pairs. The "spine of hydration" in the minor groove was clear. Water residence times are longer in the minor groove than in the major groove, although relatively short in both cases. No especially long values are observed for sites where water molecules were observed by X-ray diffraction, indicating that water molecules having a high probability of being located in a specific site are also fast exchanging.

11. Chiusano ML, Colonna G
From DNA to protein: comparative analyses to investigate structural relationships
Meeting: BIOCOMP 2000 - Year: 2000
Topic:

Abstract: We developed a computational method to analyse a coding sequence considering all the information available for the sequence itself and its product. By a graphical analysis of the composition of the nucleic acid sequence, the corresponding amino acid sequence, its chemico-physical properties, the structural and functional information derived from the Swissprot database, the prediction of secondary structure derived from a consensus of five different predictive methods and the three-dimensional information derived from the DSSP program (when the experimental structure of the protein exists), it is possible to investigate and summarize the structural features of a protein from its coding sequence to its structure. The software also allows the analysis of multiple sequence alignments. In this case, it is possible to perform a deeper analysis to infer both functional and evolutionary information. Integrating the software with a tool that calculates the substitution rate for both synonymous and nonsynonymous positions, we identified different compositional and substitution behaviour for the secondary structures of 34 mammalian coding sequences.

12. Chiusano ML, Colonna G
User friendly computational tools for the analyses of large nucleic acid sequences
Meeting: BIOCOMP 2000 - Year: 2000
Topic:

Abstract: We propose a user-friendly computational workbench for large sequence analysis. Large-scale sequencing contributes greatly to the amount of biological data on the structures and functions of nucleic acid and protein macromolecules, which explains the strong impact of this technique on Computational Biology. The interest in developing computational tools for sequence characterisation provides representative examples of the need for efficient and exhaustive methods of analysis. In particular, one of the most widespread computational challenges in the Biocomputing field is the search for biologically meaningful patterns and motifs, for instance the search for coding regions. The many computational tools for coding region analysis available on the net offer varied analytical approaches, but they often cause confusion about the reliability of the methods and drawbacks due to inaccessible software. Moreover, sequence size, usually hundreds of thousands of nucleotides, can be a limit given the computational power usually available in a laboratory and the methods employed. These difficulties may be compounded by the fact that the software often runs at remote sites. To overcome these limits and to improve the reliability and completeness of the results, we developed a set of computational tools written in C. With the methods available in the set, user-driven manipulation of the sequence under analysis makes it possible to overcome size problems. Integrating and comparing the results obtained with software available at several remote Internet sites seems to guarantee a certain accuracy in the analysis.
The implementation of a graphical interface allows direct interpretation of the displayed results, which can be easily manipulated for further studies. This workbench can be considered an advantageous collection of tools, helpful for the manipulation, description and interpretation of integrated comparative analyses of large nucleic acid sequences.

13. Ciarapica R, Rosati J, Soucek L, Nasi S
Libraries of bHLH(Zip) domains
Meeting: BIOCOMP 2000 - Year: 2000
Topic:

Abstract: The bHLH and bHLHZip transcription factors are essential components of the programmes of cell growth, differentiation and death, and many therefore play a role in the onset of cancer. They are characterized by a highly conserved three-dimensional structure comprising a basic DNA-binding domain (b) and a helix-loop-helix dimerization domain (HLH), followed by a leucine zipper (Zip) in the bHLHZip proteins. These proteins act as dimers and recognize hexameric, palindromic DNA sequences, the E boxes. Dimerization is essential for activating the genetic programme, and dimerization specificity determines the functional potential of the dimers. Molecular modelling studies have shown that the molecular recognition of the dimers can be modified by site-directed mutagenesis of amino acids in the leucine zipper. In this way a mutant bHLHZip domain was obtained that specifically interferes with the biological activity of the Myc gene. Through a systematic study of the primary, secondary and tertiary structures of the many bHLH and bHLHZip dimerization domains present in databases, we derived criteria for introducing mutations at positions considered critical for recognition specificity; we then obtained by PCR a degenerate variety of domains suitable for presentation by phage display. Before constructing the phage libraries, we compared four different vectors in order to identify the best one for correct display of the domains. The domains were fused to the coat proteins pIII and pVIII of filamentous phages, and to the capsid protein D of lambda. Phage lambda proved the most suitable, since it was the only one in which the entire bHLHZip region could be displayed at good levels while retaining its ability to dimerize.
The bHLH(Zip) domain libraries will make it possible to study their molecular recognition properties in vitro and to isolate variant sequences that specifically interfere with natural bHLHZip proteins and are therefore of potential pharmacological interest.

14. Ciccarelli FD, Alberti S
Structural features of the cysteine-rich region of the Trop molecules
Meeting: BIOCOMP 2000 - Year: 2000
Topic: Modelling

Abstract: Trop-1 and Trop-2 are homologous transmembrane glycoproteins that are expressed at high levels by most human cancer cells. Trop-1 is induced by cell proliferation and oncogenic transformation, and is a marker of epithelial progenitor cells. Trop-2 is expressed by terminally differentiated epithelial cells and by the placental trophoblast. In the N-terminal region of both molecules there is a large cysteine-rich region (98 residues in Trop-1 and 111 in Trop-2) that is conserved in all the Trop molecules cloned so far. The last six cysteines of this region constitute a typical thyroglobulin domain, while the first six cysteines do not conform to any typical PROSITE pattern. Sequence analysis, secondary structure prediction, fold recognition indicate that an EGF-like domain can be recognized in the region involving the first six cysteines of the Trop molecules. Moreover, the results of sequence analysis show that this sequence module formed by an EGF domain and a thyroglobulin domain is also present in other proteins involved in cell adhesion and in growth control, and hence a role of such a module in these functions can be proposed. The Swiss-PDB-Viewer homology modelling program was employed to build a 3D model of the thyroglobulin domain of the Trop molecules. As a template, we used the thyroglobulin domain of the p41 protein. The RMSD between target and template structures is 1.36 Å. Moreover, 87% of the predicted 3D structure residues fall in the core region of the Ramachandran Plot and 100% in its acceptable region. We are working on the prediction of the 3D structure of the EGF region: joining the two models, a final prediction of the complete cysteine-rich region of the Trop molecules could be obtained.
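
For readers unfamiliar with the RMSD figure quoted in this abstract, the quantity over N paired atoms is the square root of the mean squared distance between corresponding atoms. The Python sketch below illustrates that formula only. It assumes the two structures are already superposed and that atoms are paired one to one; the abstract does not say which atoms were compared or how the superposition was done, and the function name and coordinates are invented for the example.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Invented coordinates (in Angstrom): the second set is shifted by 2 along z.
template = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
model = [(0.0, 0.0, 2.0), (1.5, 0.0, 2.0)]
print(rmsd(template, model))  # 2.0
```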

15. De Fonzo V, Aluffi-Pentini F, Bersani E, Parisi V
A new algorithm for finding tandem repeats in the genome
Meeting: BIOCOMP 2000 - Year: 2000
Topic:

Abstract: For years our group has been studying theoretically the phenomena linked to VNTRs (variable number of tandem repeats); the results of this research led us to introduce the concept of "dynamical genetics", borrowed from Goldschmidt (1938), to explain the onset of various diseases and, more generally, various phenomena linked to DNA dynamics [De Fonzo et al, 2000]. In particular, dynamical genetics covers not only the well-known diseases caused by triplet expansion (e.g. Huntington's disease and fragile X syndrome [Sutherland & Richards, 1995]) but also others caused by non-triplet repeats (e.g. diabetes mellitus and myoclonic epilepsy [Virtaneva et al, 1997]), as well as fundamental phenomena such as tumours, apoptosis and viral integration. We have therefore implemented a bioinformatics program that identifies TRs (tandem repeats) and displays them in an easily readable form. In an earlier article [De Fonzo et al, 1998] we introduced the first outlines of dynamical genetics and described a preliminary version of the algorithm, based on the well-known algorithms of Needleman & Wunsch (1970) and Smith and Waterman (1981), which are in turn based on dynamic programming (Bellman, 1957). In addition, we established an objective criterion for assigning similarity scores between the bases compared from one repeat to the next. The algorithm was implemented in C and the software was extensively tested, with satisfactory results, on all sequences containing VNTRs associated with human diseases available in the GenBank database. We are about to publish an improved version that also provides the consensus (whose absence was the main limitation of the previous version), and later we will produce a more powerful version that can also reconstruct the history of successive mutations.
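
As background to the dynamic programming approach cited in this abstract, the Python sketch below computes a Needleman-Wunsch global alignment score and uses it to compare each repeat copy with the next. It is not the authors' C implementation: the scoring values (match 1, mismatch -1, gap -1) are placeholders rather than the objective criterion the abstract mentions, the repeat period is assumed to be known in advance, and the function names and test sequence are invented.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score by dynamic programming.

    The scoring values are placeholders, not the criterion from the abstract.
    """
    # Row 0: aligning an empty prefix of `a` with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with an empty prefix of `b` costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def adjacent_copy_scores(sequence, period):
    """Score each repeat copy of length `period` against the next copy."""
    copies = [sequence[i:i + period] for i in range(0, len(sequence), period)]
    return [global_alignment_score(x, y) for x, y in zip(copies, copies[1:])]


# Invented example: four CAG-like copies, the third carrying one substitution.
print(adjacent_copy_scores("CAGCAGCTGCAG", 3))  # [3, 1, 1]
```

A drop in the score between neighbouring copies marks a point where the repeat has diverged, which is the kind of signal a tandem repeat finder can build on.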

16. Facchiano AM, Colonna G, Di Gennaro S, Farisei F, Poerio E
Homology modelling strategy for prediction of the 3-D structure of a wheat protein inhibitor
Meeting: BIOCOMP 2000 - Year: 2000
Topic: Modelling

Abstract: In order to investigate structural/functional relationships of a wheat subtilisin-chymotrypsin inhibitor (WSCI) (1), we decided to explore its secondary and tertiary structures by applying a homology modelling procedure (using the Modeller program as part of the Quanta package) (2). The barley chymotrypsin inhibitor CI-2A (3), which exhibits 89% sequence similarity with the wheat inhibitor, was chosen as the reference structure. The best model structure obtained for WSCI shows that 50% of its amino acid sequence (72 residues) is involved in motifs of secondary structure; in particular, 11 amino acid residues give rise to a helix, and 12 residues form a long loop connecting two parallel strands, each made up of 6 residues. In the spatial model, the relative positions of these motifs are in agreement with the experimental data obtained upon interaction between WSCI and subtilisin; in fact, the bacterial proteinase specifically cleaves the inhibitor peptide bond Met48-Glu49, located in the middle of the connecting loop. The weak interactions observed (H-bonds and salt bridges) in the WSCI model are in perfect agreement with those found in the reference structure of CI-2A. Investigations regarding the mode of interaction between WSCI and susceptible proteinases are in progress.

17. Facchiano AM, Ammirato G, Chiusano ML, Gallo T, Colonna G
A service for protein and nucleic acid analysis at the CRISCEB web site
Meeting: BIOCOMP 2000 - Year: 2000
Full text in a new tab
Topic: Services

Abstract: Modern biological sciences need informatics support to manage and analyze the large amount of structural and functional information about proteins and nucleic acids. The availability of databases and analysis tools, as well as the opportunity to use special-purpose methods and software, is an undoubted advantage for any research team. For this purpose, our group maintains a web site (http://crisceb.area.na.cnr.it) that supports molecular biology research by offering public domain bioinformatics software and databases and by developing computational methods for protein and nucleic acid analysis. The SRS (Sequence Retrieval System) allows our users to query more than 70 databases. Public domain software is also available (CINEMA, WebMol, Phylodendron). Moreover, we are interested in developing new web tools to address specific research problems. As an example, we have recently built a tool to search for helix motifs in protein sequences. Another ongoing project concerns the search for 3D similarities among protein structures. The aim of our work is to integrate all these tools in order to create a real bioinformatics laboratory on the web.

18. Fanelli F, De Benedetti PG
Theoretical study on mutation-induced activation of GPCRs
Meeting: BIOCOMP 2000 - Year: 2000
Full text in a new tab
Topic: Modelling

Abstract: The rhodopsin family of receptors employs guanine nucleotide binding proteins (G proteins) to transduce signals across the cell membrane. All the G protein coupled receptors (GPCRs) share the presence of seven hydrophobic regions that are believed to form a bundle of α-helical transmembrane domains connected by alternating intracellular and extracellular hydrophilic loops. Three-dimensional model building and molecular dynamics (MD) simulations of the α1b-adrenergic receptor (α1b-AR), of the oxytocin receptor (OTR) and of the luteinizing hormone receptor (LHR) were employed to provide hypotheses about the molecular mechanisms underlying the mutation-induced activation of these GPCRs. The comparative analysis of the wild type receptors and of several constitutively active or inactive mutants was instrumental in inferring the structural/dynamics features that could characterize the active and the inactive forms of these receptors. These features were also employed to predict the functional behavior of new receptor mutants. Rigid body docking simulations between the functionally different forms of the α1b-AR and the LHR, on one hand, and heterotrimeric G proteins, on the other, suggested that the cytosolic crevice shared by the constitutively active receptor structures, formed by the second and third intracellular loops as well as by the cytosolic extensions of helices 3, 5 and 6, might participate in the receptor-G protein interface. The results of this study might provide a structural framework to interpret the pathological effects induced by naturally occurring mutations of the LHR. In addition, the theoretical models proposed here can be useful for designing new mutations or ligands able to modulate receptor function, as well as for guiding experiments aimed at exploring the receptor-G protein interface.

19. Fariselli P, Casadio R
The role of evolutionary information in predicting the disulfide-bonding state of cysteine in proteins
Meeting: BIOCOMP 2000 - Year: 2000
Full text in a new tab
Topic: Proteins analysis and structure prediction

Abstract: A neural network-based predictor is trained to distinguish the bonding states of cysteine in proteins starting from the residue chain. Training is performed using 2452 cysteine-containing segments extracted from 641 non-homologous proteins of well-resolved 3D structure. After cross-validation, the prediction efficiency is as high as 72% when the predictor is trained on single protein sequences. The addition of evolutionary information in the form of multiple sequence alignments, together with a jury of neural networks, increases the prediction efficiency to 81%. Assessment of the quality of the prediction with a reliability index indicates that more than 60% of the predictions have an accuracy level greater than 90%. A comparison with a previously described statistical method, tested on the same database, shows that the neural network-based predictor achieves the higher efficiency.

20. Ferrè F, Via A, Helmer-Citterich M
Motifs for recognition of and/or interaction with phosphoamino acids.
Meeting: BIOCOMP 2000 - Year: 2000
Full text in a new tab
Topic:

Abstract: Protein kinases (PKs) are a class of enzymes involved in signal transduction that phosphorylate specific residues of specific substrates; consensus substrate sequences are known for many of them. Depending on the type of residue they phosphorylate, PKs are divided into serine/threonine kinases and tyrosine kinases; some PKs, such as the MAPKKs, show dual specificity. Our aim is to compare the active-site surface of one member of each PK subfamily (following the SCOP classification) in order to highlight conserved features. The approach we follow is the 3D-motif method (de Rinaldis et al., JMB 284, 1211-1221, 1998), which uses multiple alignments of protein surfaces to define three-dimensional motifs associated with a specific function; these motifs can then be used to search databases of protein structures. This methodology makes it possible to define a surface 'signature' that characterizes the binding specificity of the two PK families and that is hard to detect by sequence alignment. It will also allow predictions about the structural basis of binding specificity for PKs whose substrates have not yet been characterized. In addition, it will be possible to define an experimental strategy to confirm our results by designing mutants (with the aim of changing the substrate-binding specificity of a given PK). The same approach can be used to compare the surfaces of protein phosphatases, in order to highlight residues conserved between serine/threonine phosphatases and tyrosine phosphatases, and between protein phosphatases and PKs.
Our hypothesis is that all proteins able to bind phosphoamino acids may share surface similarities; other protein domains known for their ability to recognize phosphoamino acids, such as SH2 and WW domains, will be studied in the course of the project. In the initial phases of the project, comparing one member of each PK family revealed conserved surface residues; many of them are known to be important for catalysis or substrate recognition. Some are common to all the proteins examined, while others are specific to serine/threonine kinases or to tyrosine kinases. At present we are examining the interactions between PKs and their substrates by studying the structures of enzymes co-crystallized with a substrate or an inhibitor.

21. Filippini F, Rossi V, Picco R, Floriduz M, Naccari T, Carpi A, Budillon A, Vacca M, Gianfrancesco F, Ciccodicola A, D'Urso M
From comparative genomics to ‘molecular bioinformatics’: an integrated approach to functional genomics
Meeting: BIOCOMP 2000 - Year: 2000
Full text in a new tab
Topic: Others

Abstract: Although several genomes have been largely or completely sequenced, we are still far from a 'whole proteome' functional characterization. Thousands of 'hypothetical proteins', 'unique' (i.e. not homologous to any other) gene products, and even partially characterized proteins hide unknown, possibly important functions. In the absence of experimental confirmation, sequence analysis alone never demonstrates a function, and simple homology search cannot even infer function from sequence for about a third of the gene products in any of the sequenced genomes. On the other hand, a 'brute-force' experimental characterization of whole proteomes would be too expensive and time-consuming. In order to find an alternative to shotgun characterization of gene products with unknown function, we are following a "molecular bioinformatics" approach, combining step-by-step in silico analyses with in vitro and in vivo prediction-driven experiments. This allowed us to unravel the molecular mechanisms underlying the function of the rolB plant oncogene (F. Filippini et al., Nature 379:499-500, 1996; V. Rossi et al., ms. in preparation) and of the bifunctional phospholipase-chitinase EP3 (R. Picco et al., ms. in preparation). Further work is in progress on the characterization of the unique N-terminal domain of SYBL1 (M. D'Esposito et al., Nature Genetics, 227-230, 1996), on the molecular significance of MeCP2 mutations in Rett syndrome patients, in an attempt to establish a genotype-phenotype correlation (Vacca et al., ms. in preparation), and on the comparative analysis of mammalian and plant adaptins and protein tyrosine kinases. Using comparative analysis of mammalian and plant genomes, we are searching for evolutionarily conserved "sequence function tags", such as motifs and even weak homology regions, in order to obtain suggestions for a 'targeted' experimental approach to the demonstration of function or for the identification of homologous elements.
This approach is ultimately aimed at unravelling the possible deregulation of important functions underlying genetic diseases, or fundamental genes involved in plant and mammalian developmental control.

22. Jacoboni I, Casadio R
Predicting the structure of membrane porins
Meeting: BIOCOMP 2000 - Year: 2000
Full text in a new tab
Topic:

Abstract: Presently two types of membrane protein structures are known at atomic resolution: the alpha helical proteins of the cytoplasmic membrane and the beta barrel proteins of the bacterial outer membrane. The topography and topology of alpha helical transmembrane (TM) helices (H) can be predicted with good accuracy, and several web servers are available for TMH prediction. However, this is not so for the second type of membrane proteins, owing to the difficult task of predicting beta strands, particularly those of membrane proteins. Beta barrel structures are presently also believed to form the transbilayer pore of the voltage dependent anion channels in the outer membrane of mitochondria from different species. For these proteins, barely homologous to the outer membrane porins, several sequences are available and modeling is required. Moreover, it has recently been recognized that other proteins of the outer membrane are also endowed with beta barrel structures, which anchor the protein to the membrane. It therefore seems that modeling of the beta barrel structures is as important as that of the TMHs. Modeling requires a correct prediction of the protein regions generating the beta barrel. We developed a neural network based predictor to locate putative beta strands adopting the TM beta barrel structure starting from the protein sequence. The predictor is trained and cross validated using the six porins presently known at atomic resolution in the PDB database. These proteins, albeit characterized by the same 3D structure, have very low sequence identity. Network outputs are then filtered with a Hidden Markov Model procedure to avoid spurious assignments and to select the beta barrel-forming beta strands. The predictor accuracy is as high as 73%, and its performance is higher than that previously obtained with other statistical and empirical methods.

23. Lavorgna G, Boncinelli E
Database search for potential target genes of transcription factors
Meeting: BIOCOMP 2000 - Year: 2000
Full text in a new tab
Topic: Sequence analysis

Abstract: One of the fundamental problems of biological research is understanding the basic mechanisms that allow differentiated cells, although totipotent, to selectively use only a small and distinctive portion of their enormous genetic potential. Each cell is in fact characterized by a particular configuration of active and inactive genes, whose expression is tightly regulated in both space and time. For example, the cells of the hand are what they are because they express a set of genes largely different from the set expressed by other cell types, such as brain cells. Similarly, neoplastic cells show a characteristic gene expression pattern that is altered with respect to healthy cells. This control of transcriptional activity is largely entrusted to proteins, the transcription factors (TFs), which control, often in combination with other TFs, the activity of several hundred genes. TFs act by binding specifically to short DNA sequences (the so-called binding sites) located in the regulatory sequences of their target genes, whose activity is consequently decreased or enhanced. One way to analyse the mechanism of action of a TF is therefore to identify its target genes. Unfortunately, this procedure often proves complex and expensive experimentally and is not always successful. An attractive alternative approach is to search for potential TF binding sites directly in the enormous DNA sequence databases accumulated over the years. The elusive and degenerate nature of binding sites, however, makes this approach hard to apply.
A database search for these short sequences in fact yields results in which the predominance of 'background noise' (that is, the high number of sites that are not biologically relevant) would make it virtually impossible to formulate hypotheses that can then be tested experimentally. We therefore developed a program, TargetFinder, designed to reduce this background noise considerably while preserving the real target genes found in the database search. The heuristic used by the program incorporates into the search simple considerations about the biological context in which the search is carried out. We will illustrate how our laboratory used TargetFinder to search for potential target genes of the murine transcription factor Otx2.
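The background-noise problem described in this abstract can be made concrete with a small Python sketch. It scans one strand of a sequence for a degenerate consensus written with IUPAC codes and estimates how many matches would be expected by chance in random sequence with equal base frequencies. This is not TargetFinder and does not reproduce its heuristic; the consensus `TAATCY`, the function names and the equal-frequency assumption are illustrative choices only.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_sites(consensus, seq):
    """Return 0-based start positions of all (overlapping) matches."""
    pattern = "".join(IUPAC[c] for c in consensus.upper())
    # A lookahead lets the regex engine report overlapping matches.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]


def expected_random_hits(consensus, length):
    """Expected matches on one strand of random DNA (equal base frequencies)."""
    p = 1.0
    for c in consensus.upper():
        p *= len(IUPAC[c].strip("[]")) / 4
    return p * (length - len(consensus) + 1)


# Hypothetical consensus, used only to illustrate the calculation.
print(find_sites("TAATCY", "GGTAATCCATAATCTG"))
print(expected_random_hits("TAATCY", 1_000_000))
```

Under these assumptions a six-base site with one twofold-degenerate position is expected several hundred times per megabase by chance alone, which is the kind of noise that context-based filtering has to remove.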

24. Liuni S, Attimonelli M, Pesole G, Lanave C, Grillo G, Licciulli F, D'Elia D, Catalano D, Saccone C
European Molecular Biology Network (EMBnet): Italian National Node
Meeting: BIOCOMP 2000 - Year: 2000
Full text in a new tab
Topic: Services

Abstract: The parallel growth of information and telecommunication technologies has, since the second half of the 1980s, fostered the growth of bioinformatics networks. The large amount of data, on the one hand, and the large number of researchers interested in consulting biological databases and analysing the data they contain, on the other, led the researchers involved in data management to set up computer networks. The first bioinformatics network was EMBnet (European Molecular Biology Network), established in 1988 on the initiative of the European Molecular Biology Laboratory in Heidelberg (Germany) by twelve centres from different European countries affiliated with the laboratory. EMBnet is the first model of a 'distributed bioinformatics laboratory without walls'. The purpose of EMBnet is to support and advance research in molecular biology and biotechnology, in the broadest sense of the terms, through the combined efforts of the representatives of each EMBnet node, who offer their specific expertise in support of the scientific community. EMBnet currently consists of thirty-five European and non-European nodes. The nodes are classified as National Nodes and Special Nodes. National nodes are bioinformatics centres, appointed by the governmental authority of their country, whose task is to give the academic and industrial scientific community access to biosequence databases and analysis programs, and to organize training courses on the use of bioinformatics tools. Special nodes are bioinformatics centres with strong expertise in the development of biosequence databases and analysis programs. Italian National EMBnet Node: the Bioinformatics and Genomics group of the CNR Research Area in Bari is responsible for the Italian national node.
The national node offers its users, who include numerous university laboratories and public and private research centres, primary biosequence databases (nucleic acids, proteins), specialized databases and programs for functional analysis. The analyses that researchers can carry out with the packages and analysis programs available at the EMBnet national node are: similarity searches between sequences and databases; pairwise and multiple alignment of biosequences; identification of protein-coding regions; searches for functional elements such as promoters, splice sites, etc.; prediction of secondary structure in nucleic acid and protein sequences; and molecular evolution. Some of the bioinformatics tools, databases and analysis programs (Table I) made available to users are the result of the group's own research. All the services provided by the national node are accessible remotely through interactive working sessions over the Internet. As part of its training activity, the Italian EMBnet node periodically organizes training courses at the Research Area premises or, on request, at users' sites.

25. Martelli PL, Casadio R
Hidden Markov Models in cascade with neural networks generate a better predictor of segments of protein secondary structures
Meeting: BIOCOMP 2000 - Year: 2000
Full text in a new tab
Topic:

Abstract: Neural networks have proved to be the most efficient methods for protein secondary structure prediction. However, one of the problems in using neural networks for sequence analysis is that every prediction is independent of the others. This introduces noise in the results. The most evident effect is the presence of helices and strands only one residue long in the predicted secondary structure. Since the shortest helical stretch is 3 residues long (if 3-10 helices are included in this class) and the shortest strand is 2 residues long, these predictions can be regarded as 'syntax errors'. As a consequence, the length distributions of the secondary structure segments predicted by neural networks are quite different from those of helices, strands and coil extracted from the atomic-resolution structures of the PDB database. In this work we propose a new kind of filter, based on Hidden Markov Models (HMMs). We take advantage of their ability to capture the duration of phenomena. For instance, it is possible to include in HMMs a 'minimum length' constraint, in order to avoid single-residue predictions of helices and beta-strands. Our tool consists of a 6-state HMM: 3 states are labeled as helical states, 2 as strand and 1 as coil, following the minimal length observed in the database for these structural types. The transition probabilities among the states are then computed from the databases of known structures. Every residue along the sequence is emitted by each state with probability values equal to the outputs of the neural network predictor. The prediction of the HMM is given by Viterbi decoding, namely by computing the most probable path through the states of the model, given the neural network outputs. We show that the filter does not affect the prediction efficiency of the neural network (72% overall accuracy when three structural states are discriminated).
However, its application considerably improves the length distributions of the predicted structures as compared to a neural network based filter previously adopted.
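The filtering idea described above can be sketched in Python as follows. The sketch uses a 6-state chain (three helix states, two strand states, one coil state) and Viterbi decoding, with the per-residue network outputs used as emission scores. The allowed transitions and their uniform weights are placeholders chosen for this example; in the abstract the transition probabilities are estimated from known structures, and those values are not given here. All names in the code are hypothetical.

```python
import math

STATES = ["H1", "H2", "H3", "E1", "E2", "C"]
LABEL = {"H1": "H", "H2": "H", "H3": "H", "E1": "E", "E2": "E", "C": "C"}
# Allowed transitions. A helix must pass through H1, H2, H3 (length >= 3)
# and a strand through E1, E2 (length >= 2). Weights are uniform placeholders.
NEXT = {
    "H1": ["H2"], "H2": ["H3"], "H3": ["H3", "E1", "C"],
    "E1": ["E2"], "E2": ["E2", "H1", "C"],
    "C": ["C", "H1", "E1"],
}
START = ["H1", "E1", "C"]  # a segment may only begin in its first state
END = ["H3", "E2", "C"]    # and may only end once its minimum length is met


def viterbi_filter(nn_outputs):
    """nn_outputs: list of dicts {'H': p, 'E': p, 'C': p}, one per residue."""
    neg_inf = float("-inf")

    def emit(state, probs):
        # The network output for the state's class acts as emission score.
        return math.log(max(probs[LABEL[state]], 1e-12))

    score = [{s: (emit(s, nn_outputs[0]) if s in START else neg_inf)
              for s in STATES}]
    back = []
    for probs in nn_outputs[1:]:
        row = {s: neg_inf for s in STATES}
        ptr = {}
        for prev in STATES:
            if score[-1][prev] == neg_inf:
                continue
            for nxt in NEXT[prev]:
                cand = (score[-1][prev] - math.log(len(NEXT[prev]))
                        + emit(nxt, probs))
                if cand > row[nxt]:
                    row[nxt], ptr[nxt] = cand, prev
        score.append(row)
        back.append(ptr)
    # Trace back the most probable path from the best admissible end state.
    state = max(END, key=lambda s: score[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(LABEL[s] for s in reversed(path))
```

With this topology a path can never contain a helix shorter than three residues or a strand shorter than two, so an isolated one-residue helix in the raw network output is either extended or relabeled, depending on which alternative scores higher.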

26. Milanesi L, Rogozin I, Rizzi R
Application of ESTs mapping to improve gene prediction methods
Meeting: BIOCOMP 2000 - Year: 2000
Full text in a new tab
Topic: Sequence analysis

Abstract: Prediction of protein-coding genes in newly sequenced DNA has become very important in large genome sequencing projects. The problem is complicated by the exon-intron structure of eukaryotic genes. Gene identification in newly discovered DNA sequences is an important problem in current molecular biology studies. A number of programs have been developed for predicting protein-coding genes. The most common approach is based on combining potential functional signals with global statistical properties of protein-coding regions. Another approach to gene structure prediction is based on homology detection in databases of nucleotide or amino acid sequences. By using the information available on homologous protein sequences, it is possible to significantly improve the accuracy of gene structure prediction. Currently existing collections of expressed sequence tags (ESTs) are very large and can be very useful for gene mapping. Homology searches against the EST Division of GenBank (dbEST) and the Unigene database can be used for this purpose. ESTs offer a rapid route to gene identification (Adams et al., 1991; Adams et al., 1992) and to the analysis of expression and regulation data, and can highlight multigene family diversity and alternative splicing. EST matches may identify more than half of the known human genes (Hillier et al., 1996). The price of the high-volume and high-throughput nature of the data, however, is that ESTs contain high error rates (Aaronson et al., 1996), do not have a defined protein product, are not well annotated and present only a raw substrate for sequence matching. The ESTMAP system involves the following procedures: 1) Repeat masking. The repeated elements (for example, the human Alu elements) can be automatically masked in a query sequence before the homology search.
Homology searches against the collection of repeated elements (Jurka et al., 1992) are used for repeat detection. We implemented a program, REPEAT, for that purpose. A censored sequence (with 'N's instead of repeated elements) is automatically produced by REPEAT. 2) Homology searches. BLASTN (Altschul et al., 1990) is used for homology searches of the censored query sequence against the EST Division of GenBank (dbEST) and the Unigene database of sequences (www.ncbi.nlm.nih.gov). This step is the most time-consuming, since these EST datasets are very large. 3) EST mapping. The BLASTN output is used as input by the EST_GENE program. Information about an EST sequence is used only when the similarity between the EST sequence and the query sequence is greater than 95%. The EST_GENE module is also able to predict introns in DNA by comparing ESTs with a query sequence, based on the alignment method suggested by Huang (1994) (a linear-space divide-and-conquer strategy). EST_GENE uses the GT/AG splice site rule; however, non-canonical splicing signals (Milanesi and Rogozin, 1998) can also be predicted in cases of unambiguous alignment. 4) Output of results. The graphical visualization of the results is particularly important for the analysis of alternative splicing in a query sequence. By using a Java based graphical interface, the user can visualize the EST maps and the sequence pattern of predicted features. Homology searches are very important for functional mapping: homology with a known functional region can suggest the function of a query sequence. In particular, when the homologous protein sequence is already known and EST matches are detected, the gene structure can be reconstructed with high accuracy. Information about EST matches is automatically used by the GeneBuilder system (Milanesi et al., 1999). Acknowledgment: This work was supported by the Italian CNR Genetic Engineering Project.
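Two of the ESTMAP steps described above, repeat masking and the 95% similarity cut-off, can be sketched in a few lines of Python. This is not the REPEAT or EST_GENE code: the function names, the interval convention (0-based, end-exclusive) and the use of simple percent identity over a gapped pairwise alignment as the similarity measure are all assumptions made for the example.

```python
def mask_repeats(seq, repeat_intervals):
    """Replace each repeat interval (0-based, end-exclusive) with 'N's."""
    chars = list(seq)
    for start, end in repeat_intervals:
        end = min(end, len(chars))  # guard against intervals past the end
        chars[start:end] = "N" * max(0, end - start)
    return "".join(chars)


def percent_identity(aligned_query, aligned_est):
    """Percent of alignment columns with identical, non-gap characters."""
    if len(aligned_query) != len(aligned_est) or not aligned_query:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    matches = sum(q == e and q != "-" for q, e in zip(aligned_query, aligned_est))
    return 100.0 * matches / len(aligned_query)


def keep_est_hits(hits, threshold=95.0):
    """Keep names of EST hits whose identity is strictly above the threshold.

    hits: iterable of (name, aligned_query, aligned_est) tuples.
    """
    return [name for name, q, e in hits if percent_identity(q, e) > threshold]


query = "ACGTACGTACGTACGTACGT"
print(mask_repeats("ACGTACGTAC", [(2, 5)]))
print(keep_est_hits([("est1", query, query),
                     ("est2", query, "ACGTACGTACGTACGTACGA")]))
```

In the example the second hit has exactly 95% identity and is therefore dropped, since the abstract states that an EST is used only when the similarity is greater than 95%.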

27. Parisi V, Aluffi-Pentini F, Bersani E, De Fonzo V
The virtual cell: from genomics to proteomics. A new algorithm for integrating differential equations.
Meeting: BIOCOMP 2000 - Year: 2000
Full text in a new tab
Topic:

Abstract: For the biological problem we refer to the abstract by Enrico Bersani, and here we examine only the mathematical and numerical one. Restricting ourselves to problems without spatial dimensions, that is, neglecting space (transport and diffusion), the systems of nonlinear differential equations governing the rates of chemical reaction kinetics (both organic and inorganic) are polynomial [Gavalas 68] and can always be reduced to the form of Riccati systems [Kerner 81], that is, the typical form of the Lotka-Volterra equations [Lotka 25][Volterra 26-31], which mark the birth of modern biomathematics. When only the asymptotic solution is of interest, the numerical problem becomes the solution of a simultaneous system of second-order algebraic equations, which can be solved either with any good general-purpose algorithm [Aluffi-Pentini et al. 84] or with specialized algorithms [Cox & Sturmfels 97]. If instead, as is often the case, the time course is of interest, either to quantify the duration of a transient or to study self-sustained oscillations, numerical integration of the system becomes necessary. Numerical integration of differential equations runs into three kinds of problems: round-off, nonlinearity and ill-conditioning. As for numerical round-off, ad hoc techniques exist to avoid the harmful accumulation of truncation errors [Knuth 81]. As for the nonlinearity of the differential equations, it is quadratic, the simplest possible kind, and any integration algorithm handles it easily. Ill-conditioning, on the other hand, a concept whose origin marks the birth of modern numerical analysis [Von Neumann & Goldstine 47] [Turing 48], is the main obstacle to be faced.
Loosely and imperfectly stated, a problem is the more ill-conditioned the larger the ratio between the time scale of the slowest processes and that of the fastest ones. For example, if we study the ageing of a neuron, the slowest processes are the changes to its DNA, while the fastest are probably the protection from free radicals by superoxide dismutase. The classical explicit integration algorithms (Euler, Runge-Kutta, etc.) are unusable on very ill-conditioned problems: they inevitably either require an astronomical computational cost or give rise to instability. More recently, a new class of integration algorithms, L&S [Lambert & Sigurdsson 1972], has been introduced; these are linearly implicit but nonlinearly explicit and all enjoy the property of A-stability [Dahlquist 63], which allows the equations to be integrated in reasonable computing times without the risk of spurious instabilities. Unfortunately, the L&S-type algorithms proposed so far cannot integrate satisfactorily, either qualitatively or quantitatively, the typical equations encountered when simulating biochemical processes, not even in the linear cases. Consider the two typical L&S-type algorithms, linearized backward Euler and linearized Euler-Cauchy: the first integrates relaxation equations satisfactorily but introduces spurious damping in oscillating ones; conversely, the second treats oscillating phenomena correctly but introduces spurious oscillations where there should be none. The other algorithms proposed so far also suffer from one or the other of these pathologies.
The algorithm our group proposes is obtained by taking the L&S idea to its ultimate consequences. It is nonlinearly explicit but linearly exponential; it has built-in automatic correction for the accumulation of numerical round-off; its computational cost is not excessive; it does not suffer from the pathologies described above; it is not disturbed at all by ill-conditioning; in purely linear cases it gives the exact analytical solution; in nearly linear cases it is extremely accurate; and it might encounter some difficulty only in cases where the nonlinear component is dominant. For these reasons we believe that our algorithm is decidedly to be preferred for simulating the virtual cell (without the introduction of spatial structure).
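The contrast between a classical explicit method and a scheme that treats the linear part exponentially can be illustrated with a scalar Python sketch. The code below uses the standard exponential Euler method for y' = a*y + g(y), which is exact whenever g is constant; it is offered only as a textbook stand-in for the general idea of a "linearly exponential, nonlinearly explicit" scheme and is not the authors' algorithm, whose details are not given in the abstract. The test equation and step size are assumptions chosen to make the stiffness visible.

```python
import math


def explicit_euler_step(y, h, a, g):
    """Classical explicit Euler step for y' = a*y + g(y)."""
    return y + h * (a * y + g(y))


def exponential_euler_step(y, h, a, g):
    """Exponential Euler step: the linear part a*y is integrated exactly,
    the remaining term g(y) is frozen at the start of the step."""
    # phi = (exp(a*h) - 1) / a, computed with expm1 for accuracy.
    phi = math.expm1(a * h) / a if a != 0 else h
    return math.exp(a * h) * y + phi * g(y)


def integrate(step, y0, h, n, a, g):
    """Apply the given one-step method n times starting from y0."""
    y = y0
    for _ in range(n):
        y = step(y, h, a, g)
    return y


# Fast relaxation toward 1: y' = -1000*y + 1000, y(0) = 0, integrated to t = 1.
a = -1000.0
g = lambda y: 1000.0
print(integrate(exponential_euler_step, 0.0, 0.01, 100, a, g))
print(integrate(explicit_euler_step, 0.0, 0.01, 100, a, g))
```

With a step of 0.01 the explicit method multiplies the error by -9 at every step and diverges, while the exponential step reproduces the exact relaxation toward 1. A usable explicit step would have to be smaller than 0.002, which is the "astronomical cost" side of the trade-off when slow processes must be followed over long times.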

28. Pastore A, Temussi PA
Predicting the glucophores of sweet proteins
Meeting: BIOCOMP 2000 - Year: 2000
Full text in a new tab
Topic: Proteins analysis and structure prediction

Abstract: Taste receptors have been studied less than those of other stimuli. However, the availability of many agonists and the practical relevance of sweeteners have stimulated indirect studies of the interaction of sweet agonists with their receptor and the development of general models of the sweet receptor active site. Most sweeteners are low molecular weight compounds, but there are also sweet macromolecules, both synthetic and natural, i.e., sweet proteins. Do they interact with the same receptor as low molecular weight compounds? There are several sweet proteins: miraculin, monellin, thaumatin, curculin, mabinlin, pentadin and brazzein, but only three of them, i.e. thaumatin, monellin and brazzein, have been studied from a structural point of view. Multiple alignment of the sequences of sweet proteins shows no similarity. There is also no obvious similarity among the structures of thaumatin, monellin and brazzein. How can we identify the protein glucophores? We made the assumption that they are similar to those of low molecular weight compounds and that all sweet compounds interact with the same receptor. In fact, our model for the sweet receptor (Temussi et al., 1984, 1991) is also consistent with macromolecules, since the active site is depicted as an open cavity with a flat bottom. When trying to explain the sweet taste of a protein, it is natural to assume the existence of some kind of "sweet finger", i.e., a protruding structural element hosting one or more glucophores. We sought to identify sweet fingers in the three sweet proteins whose structure is known. Detailed structure comparison of all loops in the structures of thaumatin, monellin and brazzein by means of DALI shows that each protein hosts a likely sweet finger in which the spatial arrangement of three key residues (an aromatic residue, a hydrogen bond donor and a hydrogen bond acceptor) is consistent with our model of the receptor active site.

29. Pesole G, Gissi C, Grillo G, Licciulli F, Larizza A, Liuni S
Structural and evolutionary analysis of eukaryotic mRNA untranslated regions
Meeting: BIOCOMP 2000 - Year: 2000
Full text in a new tab
Topic: Databanks

Abstract: The 5’ and 3’ untranslated regions of eukaryotic mRNAs may play a crucial role in the regulation of gene expression by controlling mRNA localization, stability and translation efficiency. To study the general structural and compositional features of these sequences, we previously developed UTRdb, a specialized, non-redundant database of 5’ and 3’ UTR sequences of eukaryotic mRNAs (Pesole, Liuni et al. 2000). UTRdb (release 10.0) contains 75,448 entries (26,145,985 nucleotides), which are also annotated for the presence of functional sequence patterns whose biological activity has been experimentally demonstrated. All these patterns have been collected in the UTRsite database, where each functional pattern corresponds to a specific entry that reports the consensus structure, a short description of its biological activity and the relevant bibliography. Furthermore, UTRdb entries have been annotated for the presence of repeated elements found in the Repbase database (Jurka 1998). A total of 5,818 functional elements and 54,975 repetitive elements are annotated in UTRdb. All the Web resources we implemented for the retrieval and analysis of UTR sequences are available at the UTR home page (Pesole and Liuni 1999b). UTRdb entries can be retrieved through the SRS system, where crosslinks to UTRsite as well as to the primary nucleotide and amino acid databases are also established. Through the Web facility UTRscan, any input sequence can be searched for the presence of functional patterns annotated in UTRsite, and UTRfasta assesses sequence similarity between a query sequence and UTRdb entries. The analysis of the complete UTR sequences contained in this database showed that 5’-UTR sequences, on average 187 nucleotides long, were 1.2 to 4.3 times shorter than the corresponding 3’-UTR sequences in the various taxonomic groups considered.
As for compositional properties, 5’-UTR sequences were on average GC-richer than 3’-UTR sequences in all cases, and a significant correlation was found between the GC content of 5’- and 3’-UTR sequences and the GC content of the third silent codon positions of the corresponding protein coding genes (Pesole, Liuni et al. 1997). Some structural features of 5’-UTRs known to affect mRNA translation efficiency were investigated, such as the presence of upstream ORFs and the context of the initiator ATG (Pesole, Bernardi et al. 1999). To assess the level of functional constraint on UTR sequences, we studied their evolutionary dynamics, also in comparison with the corresponding coding regions. Using suitable evolutionary models, we calculated the nucleotide substitution rates of 5’-UTR, 3’-UTR, synonymous and nonsynonymous positions by comparing complete human, murid (rat and mouse) and artiodactyl mRNAs, for which a suitable number of orthologous sequences was available.
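
The two summary statistics mentioned in this abstract (GC content and the 3’/5’ UTR length ratio) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names gc_content and utr_summary are hypothetical, and the two sequences are made-up toy strings, not UTRdb entries.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def utr_summary(five_utr: str, three_utr: str) -> dict:
    """Compare one 5'-UTR / 3'-UTR pair by GC content and length ratio."""
    ratio = len(three_utr) / len(five_utr) if five_utr else float("nan")
    return {
        "gc_5utr": gc_content(five_utr),
        "gc_3utr": gc_content(three_utr),
        "length_ratio_3_over_5": ratio,
    }


# Toy sequences invented for illustration only (assumption, not real data).
example = utr_summary("GCGGCCGCAG", "ATTTATAAAGCATTTAATAA")
```

Averaging such per-transcript values within each taxonomic group would give figures comparable in kind to those quoted in the abstract, under the assumption that complete UTR pairs are available for each mRNA.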

30. Pizzi E
Analysis of simple-sequence domains in Plasmodium falciparum proteins
Meeting: BIOCOMP 2000 - Year: 2000
Full text in a new tab
Topic: Sequence analysis

Abstract: P. falciparum proteins show a decidedly peculiar feature: when compared with homologous proteins from other organisms, they almost always contain long stretches of sequence separating well-conserved domains. A first analysis was carried out on gamma-glutamylcysteine synthetase, whose sequence is known in two different Plasmodium species (P. falciparum and P. berghei) (ref. 1). The results established that, while retaining an essentially hydrophilic character, the central portions of these insertions are characterized by the repeated use of a few amino acids (simple regions) and tend to mutate more rapidly than their "edges". To characterize these sequences, all the proteins encoded on chromosome 2 were considered and analyzed. About 88% of the proteins examined contain simple-sequence domains, which are commonly regarded as non-globular domains extruded from the "core" of the protein structure, with no known function for the protein. We carried out a first statistical analysis of all these regions, considering their length distribution, their distribution along the protein sequence, the number of insertions per protein, the amino acid composition, the presence of tandem repeats, and their predominantly hydrophilic or hydrophobic character.
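
The abstract does not say how simple-sequence regions were detected. One common stand-in, shown here purely as an illustrative sketch, is to flag sliding windows with low Shannon entropy of amino acid composition. The function names, the window size (12) and the entropy cutoff (2.2 bits) are assumptions chosen for the example, not values from the study.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def simple_regions(protein: str, window: int = 12, max_entropy: float = 2.2):
    """Return merged half-open (start, end) intervals covered by
    windows whose entropy is at or below max_entropy (assumed cutoff)."""
    regions = []
    for i in range(len(protein) - window + 1):
        if shannon_entropy(protein[i:i + window]) <= max_entropy:
            start, end = i, i + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


# Made-up toy protein: an asparagine run between two diverse flanks.
toy = "MKVLAGHTREWQ" + "N" * 15 + "DFGHIKLMPQRS"
found = simple_regions(toy)
```

From the intervals returned, the per-protein statistics listed in the abstract (region length, position along the sequence, number of insertions per protein) could be tabulated directly.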

31. Via A, Ferrè F, Brannetti B, Helmer-Citterich M
3D profiles: analysis of protein surfaces for the identification of functional determinants
Meeting: BIOCOMP 2000 - Year: 2000
Full text in a new tab
Topic: Proteins analysis and structure prediction

Abstract: We have developed a computational procedure (de Rinaldis et al., 1998) for analyzing and comparing protein surfaces associated with specific functions (e.g., particular binding or catalytic abilities). Based on a multiple alignment of homofunctional protein structures, and by analogy with the profile method for sequence homology searches (Gribskov et al., 1987), the program builds a 3D profile that can be used to search the PDB and select proteins with particular surface and functional characteristics. Analysis of the 3D profile allows the identification of residues conserved on the surface of the superimposed proteins and the definition of structural determinants associated with particular biological functions. The method can also be used: i) to identify protein structures that, despite having different folds, share one or more surface regions with similar chemical and functional properties (convergent evolution); ii) as an expert system for site-directed mutagenesis; or iii) for protein design. We are using 3D profiles to analyze the surface determinants linked to six motifs from the PROSITE database: class II aminoacyl-transfer RNA synthetases, peroxidases, the EF-hand calcium-binding domain, the heme-binding site of the cytochrome c family, the P-loop nucleotide-binding site, and the active site of eukaryotic and viral aspartyl proteases. These sequence motifs cannot recognize all and only the protein sequences associated with their function: they also select false positives and sometimes miss sequences that do have the assigned function. Our aim is to investigate whether a surface motif specific to each of the analyzed functions exists and can be identified. In the case of the P-loop, analysis of the results allowed considerations of particular biological interest.
Applying this method to the entire database of known structures could make it possible to build a database of surface motifs capable of identifying all and only the structures associated with a specific function, in those cases where analysis of the sequence alone would not. The development of "structural genomics" projects greatly widens the set of proteins to which the method can be applied.

BITS Meetings' Virtual Library
driven by Librarian 1.3 in a PHP, MySQL™ and Apache environment.

For information, email paolo.dm.romano@gmail.com.